What you need to install
Open the Command Prompt and enter:
pip install pillow
pip install pytesseract
Make a note of your own installation path (mine is C:\Program Files (x86)\Tesseract-OCR); we will need it shortly.
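If you are not sure where the executable ended up, a quick check like the one below may help. It is only a sketch: find_tesseract and DEFAULT_CANDIDATES are hypothetical names made up for this example, and the second candidate path is an assumption about another common install location, so adjust the list to your own machine.

```python
import os
import shutil

# Hypothetical candidate locations (assumptions); edit these for your own setup.
DEFAULT_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate that exists, else whatever is on PATH, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # shutil.which returns None when no "tesseract" executable is on PATH.
    return shutil.which("tesseract")


print(find_tesseract())
```

If this prints None, Tesseract-OCR is probably installed somewhere else, or not installed at all.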
Once all of the above is done, let's get hands-on!
First, a quick test with an image made in Paint
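import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract-OCR executable (adjust to your own install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
# Open the test image; the r prefix keeps the backslashes in the Windows path literal
image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")
# Run OCR on the image and print the recognized text
text = pytesseract.image_to_string(image)
print(text)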
Output
Hello word !
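What each call does
pytesseract.pytesseract.tesseract_cmd: the full path to the Tesseract-OCR executable (tesseract.exe) inside your installation folder
Image.open: opens the image you want to recognize, given the path where it is stored
pytesseract.image_to_string: converts the image into text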
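The three calls above can be bundled into one small helper. This is a minimal sketch, not an official pytesseract recipe: image_to_text and its parameters are hypothetical names chosen for illustration, and lang="eng" assumes the English language data that normally ships with Tesseract.

```python
from pathlib import Path


def image_to_text(image_path, tesseract_cmd=None, lang="eng"):
    """Return the text Tesseract recognizes in the image at image_path."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")

    # Imported here so the path check above runs even if the OCR packages are missing.
    import pytesseract
    from PIL import Image

    if tesseract_cmd is not None:
        # Same setting as in the walkthrough: full path to tesseract.exe
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # The with-block closes the image file once OCR has finished.
    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

You would call it as image_to_text(r"C:\Users\user\Desktop\Myimgtest\test_1.png", tesseract_cmd=r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"). Checking the path first gives a clearer error message when the image location is mistyped.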
If you get a SyntaxError
image = Image.open("C:\Users\user\Desktop\Myimgtest\test_1.png")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
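Remember to put an r right before the opening quote → (r"")
The r prefix tells the Python interpreter that the string is a raw string, so backslashes are kept as-is instead of being read as escape sequences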
image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")
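The raw string is not the only way to write this path. The sketch below compares three spellings using only the standard library. The variable names are just for illustration. Note that without the r prefix, the \t in \test_1.png would also be read as a tab character, even in paths that do not contain \U.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\user\Desktop\Myimgtest\test_1.png"          # raw string
escaped = "C:\\Users\\user\\Desktop\\Myimgtest\\test_1.png"  # doubled backslashes
forward = "C:/Users/user/Desktop/Myimgtest/test_1.png"       # forward slashes

# The raw and escaped literals produce exactly the same string.
print(raw == escaped)

# PureWindowsPath treats / and \ as the same separator.
print(PureWindowsPath(forward) == PureWindowsPath(raw))

# The raw string contains a backslash followed by "t", not a tab character.
print("\t" in raw)
```

The first two lines print True and the last prints False, so any of the three spellings can be passed to Image.open.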
Next, let's test with a different image
Output
This translation was prepared by Lloyd Kramer. Kramer graduated from the
University of California, Berkeley, with a major in Russian. He is also a graduate of the U.S.
Navy Foreign Language School in Boulder, Colorado. While a student at Berkeley he was
president of Dobro Slovo, the Slavic language honor society. As a naval officer during World
War H he served as both interpreter and translator in Russian for the U.S. Navy. After the
war, Kramer worked for a year as an analyst in Washington, DC. Subsequent to this
assignment, he joined the staff of the Hoover Institute and Library, Stanford University,
where he helped organize and catalog the Institute's large collection of Slavic language nonv
book materials.
Mr. Kramer now resides, with his Wife Martha, in Twain Harte, California
February 23, 2000